Update web backend get connection to properly handle field selection and schema refresh #20323

Merged: mfsiega-airbyte merged 4 commits into master from msiega/column-selection-api-changes2 on Dec 11, 2022

Conversation

mfsiega-airbyte
Contributor

What

Make the web backend handler properly handle field selection when there is a schema refresh.

How

Only include a field as selected if it:

is in the newly discovered schema AND was either originally selected OR not in the originally discovered schema at all.
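
The rule above can be sketched in isolation as follows. This is a minimal illustration over plain sets of field names, not the code from this PR: the class `FieldSelectionSketch` and the method `selectFieldsAfterRefresh` are hypothetical, and the real handler works on catalog and stream objects rather than `Set<String>`.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: field names stand in for the real catalog/stream types.
final class FieldSelectionSketch {

  /**
   * Returns the fields that stay selected after a schema refresh.
   * A field is kept only if it is in the refreshed schema AND it was
   * either originally selected OR absent from the original schema.
   */
  static Set<String> selectFieldsAfterRefresh(
      final Set<String> originallySelected,
      final Set<String> originallyDiscovered,
      final Set<String> refreshDiscovered) {
    // TreeSet gives a deterministic (sorted) iteration order.
    final Set<String> selected = new TreeSet<>();
    for (final String field : refreshDiscovered) {
      final boolean wasSelected = originallySelected.contains(field);
      final boolean isNewField = !originallyDiscovered.contains(field);
      if (wasSelected || isNewField) {
        selected.add(field);
      }
    }
    return selected;
  }

  public static void main(final String[] args) {
    final Set<String> result = selectFieldsAfterRefresh(
        Set.of("id", "email"),               // originally selected
        Set.of("id", "email", "phone"),      // originally discovered
        Set.of("id", "phone", "created_at")  // discovered on refresh
    );
    // "email" was removed upstream, "phone" was deliberately deselected,
    // "created_at" is new, so only "created_at" and "id" remain.
    System.out.println(result);
  }
}
```

Iterating over the refreshed schema means a field removed upstream can never be returned, which is the case the TODO below refers to.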

TODO before merging: add another unit test for the case where a field is removed in the newly-discovered schema.

Recommended reading order

Reading the new unit test first might make the intended logic clearer, and therefore make the actual code change easier to review.

@mfsiega-airbyte mfsiega-airbyte requested a review from a team as a code owner December 9, 2022 19:49
@octavia-squidington-iv octavia-squidington-iv added the area/platform and area/server labels Dec 9, 2022
@@ -357,7 +360,7 @@ public WebBackendConnectionRead webBackendGetConnection(final WebBackendConnecti
       connection.setStatus(refreshedCatalog.get().getConnectionStatus());
     } else if (catalogUsedToMakeConfiguredCatalog.isPresent()) {
       // reconstructs a full picture of the full schema at the time the catalog was configured.
-      syncCatalog = updateSchemaWithDiscovery(configuredCatalog, catalogUsedToMakeConfiguredCatalog.get());
+      syncCatalog = updateSchemaWithDiscovery(configuredCatalog, catalogUsedToMakeConfiguredCatalog.get(), catalogUsedToMakeConfiguredCatalog.get());

Contributor

Why are we just passing the same catalogUsedToMakeConfiguredCatalog.get() in twice?

Contributor

Oh hm, I think I see why, but it's confusing. Maybe in the case where there isn't a refreshed catalog, we should call a different, simpler method that takes advantage of the fact that the original discovered catalog is equal to the discovered catalog?

Contributor Author

Yeah I can see how this is confusing.

It wouldn't actually simplify much to have two different methods - essentially it's mostly saying "reconstruct it by looking at what we persisted versus the latest catalog". This PR adds a third element: (1) what we persisted; (2) the whole catalog when we persisted it; (3) the latest catalog. It's just that in the case where the catalog hasn't been rediscovered, (2) and (3) are actually the same. But otherwise the logic stays the same.

I do agree that it stands out as weird to pass the same thing twice like that, so I added a helper method in the hopes of better readability.

WDYT?
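
The relationship between the three inputs can be sketched roughly as below. This is an illustration only and not the helper that was added in this PR: the class and method names are hypothetical, and sets of field names stand in for the real catalog types. The point is that the no-refresh case is the three-input logic with inputs (2) and (3) being the same object.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names and types are assumptions, not the PR's code.
final class SchemaReconstructionSketch {

  /**
   * Three-input form:
   * (1) what was persisted, (2) the whole catalog when it was persisted,
   * (3) the latest catalog.
   */
  static Set<String> reconstruct(
      final Set<String> persistedSelection,
      final Set<String> catalogWhenPersisted,
      final Set<String> latestCatalog) {
    final Set<String> result = new TreeSet<>();
    for (final String field : latestCatalog) {
      if (persistedSelection.contains(field) || !catalogWhenPersisted.contains(field)) {
        result.add(field);
      }
    }
    return result;
  }

  /**
   * Helper for the case where the catalog has not been rediscovered:
   * (2) and (3) are the same, so callers do not pass it twice themselves.
   */
  static Set<String> reconstructWithoutRefresh(
      final Set<String> persistedSelection,
      final Set<String> catalogWhenPersisted) {
    return reconstruct(persistedSelection, catalogWhenPersisted, catalogWhenPersisted);
  }
}
```

With identical inputs for (2) and (3), no field can count as new, so the result reduces to the persisted selection restricted to the catalog.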

Contributor

Nice! Makes sense, the helper makes it much clearer

originalDiscoveredStream.getStream().getJsonSchema().findPath("properties").fieldNames()
.forEachRemaining((name) -> originallyDiscovered.add(name));
stream.getJsonSchema().findPath("properties").fieldNames().forEachRemaining((name) -> refreshDiscovered.add(name));
// We include a selected field if it:

Contributor

great comments, would be hard to follow without them!

final WebBackendConnectionRead result = testWebBackendGetConnection(true, connectionRead,
operationReadList);

// We expect the discovered catalog with two fields selected: the one that was originally selected,

Contributor

Also really appreciate the comments describing how the test works!

@pmossman pmossman left a comment

Nice! Really helpful comments that help navigate the complexity. Also test cases look solid, great stuff!

@mfsiega-airbyte mfsiega-airbyte merged commit f05ac16 into master Dec 11, 2022
@mfsiega-airbyte mfsiega-airbyte deleted the msiega/column-selection-api-changes2 branch December 11, 2022 21:17
marcelopio commented Dec 23, 2022

This change broke the "Refresh Schema" button for a lot of connections on my install.

2022-12-23 13:47:04 ERROR i.a.s.a.ApiHelper(execute):28 - Unexpected Exception
java.util.NoSuchElementException: No value present
	at java.util.Optional.get(Optional.java:143) ~[?:?]
	at io.airbyte.server.handlers.WebBackendConnectionsHandler.webBackendGetConnection(WebBackendConnectionsHandler.java:352) ~[io.airbyte-airbyte-server-0.40.26.jar:?]
	at io.airbyte.server.apis.WebBackendApiController.lambda$webBackendGetConnection$2(WebBackendApiController.java:51) ~[io.airbyte-airbyte-server-0.40.26.jar:?]
	at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:18) ~[io.airbyte-airbyte-server-0.40.26.jar:?]
	at io.airbyte.server.apis.WebBackendApiController.webBackendGetConnection(WebBackendApiController.java:51) ~[io.airbyte-airbyte-server-0.40.26.jar:?]
	at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:578) ~[?:?]
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) ~[jersey-common-2.31.jar:?]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) ~[jersey-common-2.31.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[jersey-common-2.31.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[jersey-common-2.31.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244) ~[jersey-common-2.31.jar:?]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) ~[jersey-common-2.31.jar:?]
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680) ~[jersey-server-2.31.jar:?]
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394) ~[jersey-container-servlet-core-2.31.jar:?]
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) ~[jersey-container-servlet-core-2.31.jar:?]
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366) ~[jersey-container-servlet-core-2.31.jar:?]
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319) ~[jersey-container-servlet-core-2.31.jar:?]
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) ~[jersey-container-servlet-core-2.31.jar:?]
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) ~[jetty-servlet-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:569) ~[jetty-servlet-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1377) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:507) ~[jetty-servlet-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1292) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.Server.handle(Server.java:501) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273) ~[jetty-server-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[jetty-io-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[jetty-io-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[jetty-io-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) ~[jetty-util-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) ~[jetty-util-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) ~[jetty-util-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) ~[jetty-util-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375) ~[jetty-util-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806) ~[jetty-util-9.4.31.v20200723.jar:9.4.31.v20200723]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938) ~[jetty-util-9.4.31.v20200723.jar:9.4.31.v20200723]
	at java.lang.Thread.run(Thread.java:1589) ~[?:?]

This happens because source_catalog_id is empty for those connections, a situation that the following comment, a few lines below the change, already acknowledges:

/*
* Diffing the catalog used to make the configured catalog gives us the clearest diff between the
* schema when the configured catalog was made and now. In the case where we do not have the
* original catalog used to make the configured catalog, we make do by using the configured
* catalog itself. The drawback is that any stream that was not selected in the configured catalog
* but was present at time of configuration will appear in the diff as an added stream which is
* confusing. We need to figure out why source_catalog_id is not always populated in the db.
*/

There should be a check to see if catalogUsedToMakeConfiguredCatalog is also present here:
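
The snippet this comment pointed to is not preserved here. As a generic illustration of the failure mode in the stack trace, the sketch below shows an unguarded `Optional.get()` throwing `NoSuchElementException` and one possible guard. The class and method names are hypothetical, and falling back to the configured catalog is an assumption taken from the code comment quoted above, not the fix that was eventually applied.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; sets of field names stand in for real catalog types.
final class OptionalGuardSketch {

  /**
   * Picks the baseline catalog to diff against. When the original catalog
   * is missing (source_catalog_id not populated), fall back to the
   * configured catalog instead of calling get() on an empty Optional.
   */
  static Set<String> baselineCatalog(
      final Optional<Set<String>> catalogUsedToMakeConfiguredCatalog,
      final Set<String> configuredCatalog) {
    return catalogUsedToMakeConfiguredCatalog.orElse(configuredCatalog);
  }

  public static void main(final String[] args) {
    final Optional<Set<String>> missing = Optional.empty();
    try {
      missing.get(); // the unguarded call that fails in the trace above
    } catch (final NoSuchElementException e) {
      System.out.println("unguarded get(): " + e.getMessage());
    }
    // The guarded version returns the fallback instead of throwing.
    System.out.println(baselineCatalog(missing, Set.of("id")));
  }
}
```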
